Why Discretization Works for Naive Bayesian Classifiers

Authors

  • Chun-Nan Hsu
  • Hung-Ju Huang
  • Tzu-Tsung Wong
Abstract

This paper explains why well-known discretization methods, such as entropy-based and ten-bin, work well for naive Bayesian classifiers with continuous variables, regardless of their complexities. These methods usually assume that discretized variables have Dirichlet priors. Since perfect aggregation holds for Dirichlets, we can show that, generally, a wide variety of discretization methods can perform well, with insignificant differences among them. We identify situations where discretization may cause performance degradation and show that they are unlikely to happen for well-known methods. We empirically test our explanation with synthesized and real data sets and obtain confirming results. Our analysis leads to a lazy discretization method that can simplify training for naive Bayes. This new method can perform as well as well-known methods in our experiment.
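As a rough illustration of the setup the abstract describes (our own sketch, not the authors' formulation; all function names here are hypothetical), the following Python trains a naive Bayes classifier on "ten-bin" equal-width-discretized features, placing a symmetric Dirichlet prior on each discretized variable (alpha = 1 is the familiar Laplace correction):

```python
import numpy as np

def equal_width_edges(x, n_bins=10):
    """'Ten-bin' style discretization: equal-width bin edges for one feature."""
    return np.linspace(x.min(), x.max(), n_bins + 1)

def fit_discretized_nb(X, y, n_bins=10, alpha=1.0):
    """Naive Bayes on discretized features; alpha is the symmetric
    Dirichlet hyperparameter (alpha = 1 gives Laplace smoothing)."""
    classes = np.unique(y)
    edges = [equal_width_edges(X[:, j], n_bins) for j in range(X.shape[1])]
    priors = {c: np.mean(y == c) for c in classes}
    likelihoods = {}
    for c in classes:
        Xc = X[y == c]
        for j in range(X.shape[1]):
            counts = np.histogram(Xc[:, j], bins=edges[j])[0]
            # Posterior mean of the bin probabilities under the Dirichlet prior
            likelihoods[c, j] = (counts + alpha) / (counts.sum() + alpha * n_bins)
    return classes, edges, priors, likelihoods

def predict_one(x, classes, edges, priors, likelihoods):
    """Classify one example by the usual naive Bayes log-product rule."""
    def bin_index(v, e):
        return int(np.clip(np.searchsorted(e, v) - 1, 0, len(e) - 2))
    scores = {
        c: np.log(priors[c])
           + sum(np.log(likelihoods[c, j][bin_index(x[j], edges[j])])
                 for j in range(len(x)))
        for c in classes
    }
    return max(scores, key=scores.get)
```

On this reading, the "lazy" variant the abstract mentions would defer discretization to classification time, forming an interval around each test value rather than fixing bin edges during training.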


Similar articles

A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)

Feature selection is a pre-processing technique for eliminating irrelevant and redundant features, which enhances classifier performance. When a dataset contains many irrelevant and redundant features, accuracy does not improve and classifier performance degrades. To avoid this, this paper presents a new hybrid feature selection method usi...
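For concreteness, here is a small self-contained sketch (ours, not the note's code) of the two filter scores being combined, information gain and symmetric uncertainty, for discrete features:

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy H(X) of a sequence of discrete values, in bits."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def information_gain(feature, target):
    """IG(X; Y) = H(X) + H(Y) - H(X, Y)."""
    joint = list(zip(feature, target))
    return entropy(feature) + entropy(target) - entropy(joint)

def symmetric_uncertainty(feature, target):
    """SU = 2 * IG / (H(X) + H(Y)), normalized to [0, 1]."""
    denom = entropy(feature) + entropy(target)
    return 2 * information_gain(feature, target) / denom if denom else 0.0

# Example: rank two binary features against the class labels by SU.
X = [[0, 1], [0, 0], [1, 1], [1, 0]]
y = [0, 0, 1, 1]
for j in range(2):
    col = [row[j] for row in X]
    print(j, symmetric_uncertainty(col, y))   # feature 0 scores 1.0, feature 1 scores 0.0
```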


On Why Discretization Works for Naive-Bayes Classifiers

We investigate why discretization is effective in naive-Bayes learning. We prove a theorem that identifies particular conditions under which discretization will result in naive-Bayes classifiers delivering the same probability estimates as would be obtained if the correct probability density functions were employed. We discuss the factors that might affect naive-Bayes classification error under ...
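The claim can be illustrated numerically (an illustration of ours, not the paper's theorem): on a one-dimensional, two-class Gaussian problem, the posterior computed from the true densities and the posterior computed from ten-bin interval frequencies roughly agree and pick the same class.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = {0: -1.0, 1: 1.0}        # two equiprobable Gaussian classes, sd = 1
x = 0.3                        # the test point

# Posterior from the correct probability density functions
# (the shared normalizing constant cancels).
pdf = np.array([np.exp(-0.5 * (x - mu[c]) ** 2) for c in (0, 1)])
posterior_true = pdf / pdf.sum()

# Posterior from discretized intervals, estimated from samples.
samples = {c: rng.normal(mu[c], 1.0, 100_000) for c in (0, 1)}
edges = np.linspace(-4.0, 4.0, 11)               # ten equal-width bins
b = int(np.clip(np.searchsorted(edges, x) - 1, 0, 9))
freq = np.array([np.histogram(samples[c], bins=edges)[0][b] for c in (0, 1)])
posterior_disc = freq / freq.sum()

print(posterior_true, posterior_disc)   # similar estimates, same decision
```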


Augmented Naive Bayesian Classifiers for Mixed-Mode Data

Conventional Bayesian networks often require discretization of continuous variables prior to learning. It is important to investigate Bayesian networks allowing mixed-mode data, in order to better represent data distributions as well as to avoid the overfitting problem. However, this attempt imposes potential restrictions on a network construction algorithm, since certain dependency has not bee...
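One simple mixed-mode construction (a sketch under our own assumptions, not the paper's algorithm) keeps continuous attributes as class-conditional Gaussians and discrete attributes as smoothed frequency tables inside a single naive Bayes model, so no discretization step is needed:

```python
import numpy as np

def fit_mixed_nb(X_cat, X_num, y, alpha=1.0):
    """X_cat: non-negative integer-coded discrete features;
    X_num: continuous features; y: class labels."""
    model = {}
    for c in np.unique(y):
        cat, num = X_cat[y == c], X_num[y == c]
        model[c] = {
            "prior": np.mean(y == c),
            # Laplace-smoothed category probabilities per discrete feature
            "cat": [
                (np.bincount(cat[:, j], minlength=X_cat[:, j].max() + 1) + alpha)
                / (len(cat) + alpha * (X_cat[:, j].max() + 1))
                for j in range(X_cat.shape[1])
            ],
            # Gaussian parameters per continuous feature
            "mean": num.mean(axis=0),
            "std": num.std(axis=0) + 1e-9,
        }
    return model

def log_posterior(model, x_cat, x_num):
    """Unnormalized log P(c | x) mixing both feature types."""
    out = {}
    for c, m in model.items():
        ll = np.log(m["prior"])
        ll += sum(np.log(m["cat"][j][x_cat[j]]) for j in range(len(x_cat)))
        ll += np.sum(-0.5 * np.log(2 * np.pi * m["std"] ** 2)
                     - (x_num - m["mean"]) ** 2 / (2 * m["std"] ** 2))
        out[c] = ll
    return out
```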


Bayesian network classifiers which perform well with continuous attributes: Flexible classifiers

When modelling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous works have solved the problem by discretizing them with the consequent loss of information. Another common alternative assumes that the data are generated by a Gaussian distribution (parametric approach), such as conditional Gaussian networks, wit...
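The truncated snippet contrasts discretization with parametric Gaussian fits; one common "flexible" alternative (an assumption on our part, in the spirit of kernel-based naive Bayes) keeps each continuous attribute and estimates the class-conditional density f(x | c) with a Gaussian kernel density estimate:

```python
import numpy as np

def kde_pdf(x, samples, bandwidth):
    """Gaussian kernel density estimate of f(x) from training samples."""
    z = (x - samples) / bandwidth
    return np.mean(np.exp(-0.5 * z ** 2)) / (bandwidth * np.sqrt(2 * np.pi))

def class_conditional(x, samples):
    """f(x | c) with Silverman's rule-of-thumb bandwidth."""
    h = 1.06 * samples.std() * len(samples) ** (-0.2)
    return kde_pdf(x, samples, h)

# A bimodal class-conditional density: a single Gaussian would smear the
# two modes together, while the kernel estimate preserves the dip at 0.
rng = np.random.default_rng(1)
bimodal = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])
print(class_conditional(0.0, bimodal), class_conditional(2.0, bimodal))
```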


Learning Dynamic Naive Bayesian Classifiers

Hidden Markov models are a powerful technique to model and classify temporal sequences, such as in speech and gesture recognition. However, defining these models is still an art: the designer has to establish by trial and error the number of hidden states, the relevant observations, etc. We propose an extension of hidden Markov models, called dynamic naive Bayesian classifiers, and a methodolog...
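Read that way, a dynamic naive Bayesian classifier is an HMM whose emission probability factorizes naively over several observation variables at each time step. A minimal forward-algorithm sketch under that assumption (our code, not the authors'):

```python
import numpy as np

def forward_log_likelihood(pi, A, B_list, obs):
    """log P(obs) for an HMM with naively factored emissions.

    pi: (S,) initial state probabilities;
    A: (S, S) transitions, A[i, j] = P(state j at t+1 | state i at t);
    B_list: one (S, V_m) emission table per observation variable m;
    obs: (T, M) integer-coded observation matrix.
    """
    def emit(o):
        # Naive Bayes assumption: per-variable emission probabilities multiply.
        p = np.ones(len(pi))
        for B, v in zip(B_list, o):
            p = p * B[:, v]
        return p

    alpha = pi * emit(obs[0])
    log_like = 0.0
    for step, o in enumerate(obs):
        if step > 0:
            alpha = (alpha @ A) * emit(o)   # predict, then weigh by emission
        s = alpha.sum()
        log_like += np.log(s)               # accumulate the scaling factors
        alpha = alpha / s                   # rescale to avoid underflow
    return log_like
```

Classification would then pick the class whose model maximizes this log-likelihood for the observed sequence.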



Journal title:

Volume   Issue

Pages   -

Publication date: 2000